Abstract. We consider the problem of communicating over a
multiple-input multiple-output (MIMO) real-valued channel for
which no mathematical model is specified, and achievable rates
are given as a function of the channel input and output sequences
known a posteriori. This paper extends previous results
regarding individual channels by presenting a rate function for
the MIMO individual channel and showing its achievability in
a fixed transmission rate communication scenario.

I. Introduction 

We consider a channel, termed an individual channel, where 
no specific probabilistic or mathematical relation between 
the input and the output is assumed. This channel is an 
extreme case of an unknown channel. Achievable rates are 
characterized using only the input and output sequences, which 
capture the actual (a-posteriori) channel behavior. This point of 
view is similar to the approach used in universal source coding 
of individual sequences where the goal is to asymptotically 
attain for each sequence the same coding rate achieved by the 
best encoder from a model class, tuned to the sequence. This 
framework is an evolution of those considered in Shayevitz
and Feder [1] and Eswaran et al. [2], and is presented in more
detail in our papers [3] and [4], together with the relevant
background. We give a brief introduction below.

The setting we consider includes a single encoder, which receives
a message to transmit and emits symbols $x_i \in \mathcal{X}$, $i = 1, 2, \ldots, n$,
and a decoder, which receives a sequence of symbols
$y_i \in \mathcal{Y}$, $i = 1, 2, \ldots, n$, and attempts to reconstruct the
message. In the present paper the input and output symbols
are real-valued vectors, i.e. $\mathcal{X} = \mathbb{R}^t$ and $\mathcal{Y} = \mathbb{R}^r$. The
relation between $\mathbf{x} = [x_1, \ldots, x_n]^T$ and $\mathbf{y} = [y_1, \ldots, y_n]^T$
is unknown to the encoder and decoder. We consider two
communication scenarios: with feedback (possibly imperfect)
and without feedback. When there is no feedback, the
communication system transmits at a constant rate, and outage
is unavoidable, i.e. one cannot guarantee a small probability of
error in all circumstances. When feedback exists, the
communication rate may be adapted dynamically and outage may
be prevented. In both cases we assume that common randomness
exists in the encoder and the decoder. The results in the current
paper extend the previous results, yet only for the first case,
transmission at a constant rate.

The performance is measured by a rate function
$R_{\mathrm{emp}} : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}$ representing an empirical measure of the
achievable rate between the channel input and channel output
over $n$ channel uses. In the examples here and in [3], [4], $R_{\mathrm{emp}}$
can be viewed as the mutual information achieved in a certain
family of statistical models (in the current scope, all zero-mean
Gaussian channels) when the model parameters match the
empirical ones. In communication without feedback we say
that a given rate function $R_{\mathrm{emp}}(\mathbf{x}, \mathbf{y})$ is achievable with an
input distribution $Q(\mathbf{x})$ if, for large block size $n \to \infty$, it
is possible to communicate at any rate $R$ such that an arbitrarily
small error probability is obtained whenever $R_{\mathrm{emp}}(\mathbf{x}, \mathbf{y}) > R$.
The communication system is required to emit blocks with
probability distribution $Q(\mathbf{x})$, which is possible due to the
use of randomization. By placing this additional constraint we
leave aside the question of adapting the input distribution, so
that the current framework aims at achieving the empirical
"mutual information" rather than the empirical "capacity".
Another reason for the fixed prior is to avoid degenerate
systems which may transmit only "bad" sequences with low
(or zero) $R_{\mathrm{emp}}$. This constraint is further discussed in
section VIII.C.

The main result of this paper is that for the multiple-input
multiple-output (MIMO) channel $\mathbb{R}^t \to \mathbb{R}^r$ (i.e. with
$t$ transmit and $r$ receive antennas) the rate function defined
below is asymptotically achievable, in the fixed rate case:

$$R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) = \frac{1}{2} \log \frac{|\hat{R}_{XX}| \cdot |\hat{R}_{YY}|}{|\hat{R}_{(XY)(XY)}|} \qquad (1)$$

where the $n \times t$ matrix $\mathbf{X}$ denotes the channel input over $n$
symbols, the $n \times r$ matrix $\mathbf{Y}$ denotes the output, and
$\hat{R}_{XX} = \frac{1}{n}\mathbf{X}^T\mathbf{X}$, $\hat{R}_{YY} = \frac{1}{n}\mathbf{Y}^T\mathbf{Y}$ and $\hat{R}_{(XY)(XY)} = \frac{1}{n}[\mathbf{X}\,\mathbf{Y}]^T[\mathbf{X}\,\mathbf{Y}]$
are the input, the output and the joint empirical correlation
matrices, respectively. This is a generalization of the earlier
result in which the rate function $R_{\mathrm{emp}} = \frac{1}{2}\log\frac{1}{1-\hat{\rho}^2}$
was proved to be achievable for the real-valued SISO channel
$\mathbb{R} \to \mathbb{R}$ ($\hat{\rho}$ denotes the empirical correlation). As in the SISO
case, the proof is geometrically intuitive. The results extend easily
to the complex MIMO case, and to a rate function using the empirical
covariance (rather than the correlation), but we focus here on
the simpler case.
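As an illustration, the rate function $\frac{1}{2}\log\left(|\hat{R}_{XX}|\,|\hat{R}_{YY}|/|\hat{R}_{(XY)(XY)}|\right)$ can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `emp_rate`, the test signal and the use of NumPy are assumptions made here, and the three correlation matrices are assumed to be nonsingular. It also checks the SISO special case against $\frac{1}{2}\log\frac{1}{1-\hat{\rho}^2}$.

```python
import numpy as np

def emp_rate(X, Y):
    """Empirical rate in nats: 0.5*log(|Rxx| |Ryy| / |R(xy)(xy)|).

    X is n x t and Y is n x r; all three correlation matrices are
    assumed to be nonsingular (hypothetical helper, not from the paper).
    """
    n = X.shape[0]
    # slogdet returns (sign, log|det|) and avoids overflow in the determinant.
    logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal((n, 1))
y = 0.8 * x + 0.6 * rng.standard_normal((n, 1))  # arbitrary test channel

# SISO check with the (non-centered) empirical correlation factor.
rho = (x.T @ y).item() / np.sqrt((x.T @ x).item() * (y.T @ y).item())
siso_rate = 0.5 * np.log(1.0 / (1.0 - rho**2))
mimo_rate = emp_rate(x, y)
```

For $t = r = 1$ the two values agree up to floating-point error, since the determinant ratio reduces to $1/(1-\hat{\rho}^2)$.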

The paper is organized as follows: in Section II we explain
the motivation for this rate function and its relation to the
probabilistic Gaussian channel, in Section III we present in
detail the main result, which is proven in Section IV. Section
V is devoted to comments and further research items.

We use lowercase boldface letters to denote vectors, and
uppercase boldface letters to denote matrices. We use the same
notation for random variables and their sample values, and the
distinction should be clear from the context.

II. Origin of the rate function 

Consider the channel from $\mathbf{x} \in \mathbb{R}^t$ to $\mathbf{y} \in \mathbb{R}^r$, which are
real-valued vectors. For the additive white Gaussian noise (AWGN)
MIMO channel $\mathbf{y} = H\mathbf{x} + \mathbf{v}$ with $\mathbf{v} \sim \mathcal{N}(0, \sigma^2 I)$ and
$\mathbf{x} \sim \mathcal{N}(0, I)$ it is well known that the mutual information is

$$I(\mathbf{x}; \mathbf{y}) = \frac{1}{2} \log \left| I + \frac{1}{\sigma^2} H^T H \right| \qquad (2)$$

see for example [5], [6]. This reflects the maximum achievable
rate with the fixed covariance matrix $E[\mathbf{x}\mathbf{x}^T] = I$, and is
sometimes termed the open-loop MIMO capacity, since equal power
is a reasonable choice when the transmitter does not know the
channel. A more general form of the mutual information is
obtained by assuming that $\mathbf{x}, \mathbf{y}$ are any jointly Gaussian random
vectors and writing:



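$$h(\mathbf{x}) = \frac{1}{2} \log \left| 2\pi e \cdot \mathrm{cov}(\mathbf{x}) \right| \qquad (3)$$

$$h(\mathbf{y}) = \frac{1}{2} \log \left| 2\pi e \cdot \mathrm{cov}(\mathbf{y}) \right| \qquad (4)$$

$$h(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \log \left| 2\pi e \cdot \mathrm{cov}\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \right| \qquad (5)$$

Therefore:

$$I(\mathbf{x}; \mathbf{y}) = h(\mathbf{x}) + h(\mathbf{y}) - h(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \log \frac{|\mathrm{cov}(\mathbf{x})| \cdot |\mathrm{cov}(\mathbf{y})|}{\left| \mathrm{cov}\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \right|} \qquad (6)$$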

where the factors $2\pi e$ cancel out since the dimension of
the covariance matrix in the denominator is the sum of the
dimensions in the numerator. The expression (6) is more
general than (2) since it does not assume that the noise is white,
and is suitable for our purpose since it expresses the mutual
information through properties of the input and output vectors
without using an explicit channel structure. For the case of the
AWGN MIMO channel it yields the same value as (2). For the
particular case where $x, y$ are scalars with variances
$\sigma_x^2, \sigma_y^2$ and correlation factor $\rho$, Equation (6) evaluates to

$$I(x; y) = \frac{1}{2} \log \frac{\sigma_x^2 \sigma_y^2}{\sigma_x^2 \sigma_y^2 - \rho^2 \sigma_x^2 \sigma_y^2} = \frac{1}{2} \log \frac{1}{1 - \rho^2}$$

as previously obtained for the SISO case.
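The claim that the covariance-based expression coincides with the AWGN MIMO formula can be checked numerically. The sketch below is an illustration under assumed values (the sizes, the noise variance and the random $H$ are arbitrary choices made here): it builds the covariances of $\mathbf{x} \sim \mathcal{N}(0, I)$ and $\mathbf{y} = H\mathbf{x} + \mathbf{v}$ and compares the two expressions.

```python
import numpy as np

rng = np.random.default_rng(1)
t, r, sigma2 = 3, 2, 0.5            # illustrative sizes and noise variance
H = rng.standard_normal((r, t))     # arbitrary channel matrix

# AWGN formula: 0.5 * log|I + H^T H / sigma^2|
rate_awgn = 0.5 * np.linalg.slogdet(np.eye(t) + H.T @ H / sigma2)[1]

# General jointly Gaussian formula, from the three covariance matrices.
cov_x = np.eye(t)
cov_y = H @ H.T + sigma2 * np.eye(r)
cov_xy = np.block([[cov_x, H.T], [H, cov_y]])  # covariance of the stacked vector [x; y]
logdet = lambda A: np.linalg.slogdet(A)[1]
rate_general = 0.5 * (logdet(cov_x) + logdet(cov_y) - logdet(cov_xy))
```

The agreement follows from the Schur complement: the joint determinant equals $|\mathrm{cov}(\mathbf{x})| \cdot |\sigma^2 I|$.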

The empirical rate function we defined in (1) is an empirical
version of the mutual information expression in (6), except that
the covariance matrices are replaced by empirical correlation
(rather than covariance) matrices, i.e. we do not cancel the
mean. When $|\hat{R}_{XX}| = 0$ or $|\hat{R}_{YY}| = 0$ (which leads also
to $|\hat{R}_{(XY)(XY)}| = 0$), the rate function is defined by
removing the columns of $\mathbf{X}$ or $\mathbf{Y}$ (respectively) which are
linearly dependent on the others, until these determinants become
positive. It is not important which columns are removed to
break the linear dependence, due to this function's invariance
to linear transformations (Property 2 below). For the case of
$\mathbf{Y} = 0$ or $\mathbf{X} = 0$ we define $R_{\mathrm{emp}} = 0$.



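The rate function has the following properties, which are
expected from an empirical metric of the "mutual information":

1) Non-negativity: $R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) \ge 0$. This is evident from
the fact that $R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})$ is the mutual information between
two Gaussian vectors with the respective covariances. It
will also be shown in passing as part of the derivation
in Section IV.

2) Invariance under linear transformations: Any invertible
linear matrix operation on the input or output
(for example, multiplying any of the input or output
signals by a factor, adding signals, etc.) does not change
$R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})$, i.e. $R_{\mathrm{emp}}(\mathbf{X}G_X, \mathbf{Y}G_Y) = R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})$.

Proof: Suppose we multiply $\mathbf{X}$ and $\mathbf{Y}$ by arbitrary invertible
matrices $G_X$ ($t \times t$) and $G_Y$ ($r \times r$) respectively. Define
$\mathbf{X}' = \mathbf{X}G_X$ and $\mathbf{Y}' = \mathbf{Y}G_Y$; then
$|\hat{R}_{X'X'}| = |G_X^T \hat{R}_{XX} G_X| = |G_X|^2 \cdot |\hat{R}_{XX}|$ and similarly
$|\hat{R}_{Y'Y'}| = |G_Y|^2 \cdot |\hat{R}_{YY}|$. Since
$[\mathbf{X}'\,\mathbf{Y}'] = [\mathbf{X}\,\mathbf{Y}] \begin{pmatrix} G_X & 0 \\ 0 & G_Y \end{pmatrix}$,
then from the same considerations
$|\hat{R}_{(X'Y')(X'Y')}| = |G_X|^2 |G_Y|^2 \cdot |\hat{R}_{(XY)(XY)}|$, therefore
the factors cancel out and $R_{\mathrm{emp}}(\mathbf{X}', \mathbf{Y}') = R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})$.

3) Symmetry: $R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) = R_{\mathrm{emp}}(\mathbf{Y}, \mathbf{X})$.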
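The three properties listed above can be spot-checked numerically. The following sketch is illustrative only: `emp_rate` is a hypothetical helper written here, and the matrices `G_x`, `G_y` are arbitrary perturbations of the identity chosen so that they are safely invertible.

```python
import numpy as np

def emp_rate(X, Y):
    """0.5*log(|Rxx| |Ryy| / |R(xy)(xy)|) in nats (hypothetical helper)."""
    n = X.shape[0]
    logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

rng = np.random.default_rng(2)
n, t, r = 50, 3, 2
X = rng.standard_normal((n, t))
Y = X[:, :r] + 0.5 * rng.standard_normal((n, r))   # arbitrary dependence on X

# Invertible mixing matrices applied on the right (one column per antenna).
G_x = np.eye(t) + 0.3 * rng.standard_normal((t, t))
G_y = np.eye(r) + 0.3 * rng.standard_normal((r, r))

base = emp_rate(X, Y)
transformed = emp_rate(X @ G_x, Y @ G_y)   # Property 2
swapped = emp_rate(Y, X)                   # Property 3
```

Up to rounding, `base`, `transformed` and `swapped` coincide, and `base` is non-negative.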

III. The main result 

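Theorem 1 (Non-adaptive, continuous MIMO channel).
Given the channel $\mathbb{R}^t \to \mathbb{R}^r$, define the input over $n$ symbols
as an $n \times t$ matrix $\mathbf{X}$, and the output as an $n \times r$ matrix
$\mathbf{Y}$. Let $\hat{R}_{XX} = \frac{1}{n}\mathbf{X}^T\mathbf{X}$, $\hat{R}_{YY} = \frac{1}{n}\mathbf{Y}^T\mathbf{Y}$ and
$\hat{R}_{(XY)(XY)} = \frac{1}{n}[\mathbf{X}\,\mathbf{Y}]^T[\mathbf{X}\,\mathbf{Y}]$ be the input, the output and the joint empirical
correlation matrices, respectively. Define the rate function

$$R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) = \frac{1}{2} \log \frac{|\hat{R}_{XX}| \cdot |\hat{R}_{YY}|}{|\hat{R}_{(XY)(XY)}|} \qquad (7)$$

Then for every $P_e > 0$, every positive definite $t \times t$ matrix $\Lambda_x$ and
every $n \ge t + r$ there exists a random encoder-decoder pair of rate
$R$ over block size $n$, such that the distribution of the input
sequence is $\mathbf{X} \sim \mathcal{N}^n(0, \Lambda_x)$ and, for any $\gamma < 1 - \frac{t+r-1}{n}$, the
probability of error for any message given an input sequence
$\mathbf{X}$ and output sequence $\mathbf{Y}$ is not greater than $P_e$ if:

$$R \le \gamma \cdot R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) - \frac{t \lceil r/2 \rceil \log n}{n} - \frac{1}{n} \log \frac{C_L}{P_e} \qquad (8)$$

where $C_L$ is the constant of equation (9), which is decreasing in $n$.

Specifically, for every $\delta > 0$ and $\gamma < 1$ there exists $n$ large
enough so that the probability of error is not greater than $P_e$ if

$$R \le \gamma \cdot R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) - \delta \qquad (10)$$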

The theorem follows almost directly from the next lemma,
which we prove subsequently:

Lemma 1. For any $n \times r$ matrix $\mathbf{Y}$, the probability of
$R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) \ge T$, where $\mathbf{X}$ is randomly drawn
$\mathbf{X} \sim \mathcal{N}^n(0, \Lambda_x)$, is bounded by:

$$\Pr\{R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) \ge T\} \le C_L \cdot n^{t \lceil r/2 \rceil} \cdot \exp(-n \cdot \gamma \cdot T) \qquad (11)$$

for any $\gamma$ in the range $0 < \gamma < 1 - \frac{t+r-1}{n}$, where $C_L$ is
defined in (9).

Note that the bound does not depend on $\Lambda_x$. To prove
Theorem 1, the codebook $\{\mathbf{X}_m\}_{m=1}^{\exp(nR)}$ is randomly generated
by i.i.d. selection of each codeword from the Gaussian
matrix distribution $\mathcal{N}^n(0, \Lambda_x)$. The common randomness is
the codebook itself. The encoder sends the $w$-th codeword,
and the decoder uses a maximum empirical rate decoder, i.e.
chooses:

$$\hat{w} = \arg\max_m \{R_{\mathrm{emp}}(\mathbf{X}_m; \mathbf{Y})\} \qquad (12)$$

where ties are broken arbitrarily. By using Lemma 1 and the
union bound, the probability of error given $\mathbf{X}_w, \mathbf{Y}$ is bounded
by:

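$$P_e^{(w)}(\mathbf{X}_w, \mathbf{Y}) \le \Pr\left\{ \bigcup_{m \ne w} \left( R_{\mathrm{emp}}(\mathbf{X}_m; \mathbf{Y}) \ge R_{\mathrm{emp}}(\mathbf{X}_w; \mathbf{Y}) \right) \,\Big|\, \mathbf{X}_w \right\}$$
$$\le \exp(nR) \cdot C_L \cdot n^{t \lceil r/2 \rceil} \exp(-\gamma \cdot n \cdot R_{\mathrm{emp}}(\mathbf{X}_w; \mathbf{Y})) = C_L \cdot n^{t \lceil r/2 \rceil} \exp\left[ n \left( R - \gamma \cdot R_{\mathrm{emp}}(\mathbf{X}_w; \mathbf{Y}) \right) \right] \qquad (13)$$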

Therefore if (8) is satisfied, then $P_e^{(w)}(\mathbf{X}_w, \mathbf{Y}) \le P_e$, which
proves the first part of the theorem. The second part follows
directly from the first part. For any $\gamma < 1$ and $\delta > 0$ there
is $n$ large enough so that the condition $\gamma < 1 - \frac{t+r-1}{n}$ is
satisfied, and then $n$ can be increased until the redundancy in
(8), $\frac{t \lceil r/2 \rceil \log n}{n} + \frac{1}{n} \log \frac{C_L}{P_e}$, is smaller than $\delta$
(note that $C_L$ is decreasing in $n$); therefore $P_e^{(w)}(\mathbf{X}_w, \mathbf{Y}) \le P_e$
will be satisfied if (10) is satisfied. □
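The random-codebook scheme with the maximum empirical rate decoder can be sketched in a few lines. This is an illustration only: the block size, the codebook size, the linear-plus-noise channel and the helper `emp_rate` are all assumptions made here, and the decoder itself never uses the channel model.

```python
import numpy as np

def emp_rate(X, Y):
    """0.5*log(|Rxx| |Ryy| / |R(xy)(xy)|) in nats (hypothetical helper)."""
    n = X.shape[0]
    logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

rng = np.random.default_rng(3)
n, t, r, num_messages = 64, 2, 2, 16   # illustrative sizes only

# Shared random codebook: each codeword is an n x t matrix with i.i.d. N(0, 1) entries.
codebook = rng.standard_normal((num_messages, n, t))

# Hypothetical channel, unknown to the decoder: linear mixing plus noise.
H = rng.standard_normal((t, r))
sent = 5
Y = codebook[sent] @ H + 0.1 * rng.standard_normal((n, r))

# Maximum empirical rate decoding: choose the codeword with the largest R_emp.
rates = np.array([emp_rate(X_m, Y) for X_m in codebook])
decoded = int(np.argmax(rates))
```

In this toy setting the transmitted codeword has a much larger empirical rate than the independent codewords, whose rates concentrate near zero, so the decoder recovers the message index.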

IV. Proof of Lemma 1 
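To prove Lemma 1 we use the Chernoff bound:

$$\Pr\{R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) \ge T\} = \Pr\{\exp(n \gamma R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})) \ge \exp(n \gamma T)\} \le \frac{E\left[\exp(n \gamma R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}))\right]}{\exp(n \gamma T)} = L \cdot \exp(-n \gamma T) \qquad (14)$$

To prove the lemma we need to calculate

$$L = E\left[\exp(n \gamma R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}))\right] = E\left[ \left( \frac{|\hat{R}_{XX}| \cdot |\hat{R}_{YY}|}{|\hat{R}_{(XY)(XY)}|} \right)^{\gamma n / 2} \right] \qquad (15)$$

where the expected value is taken with respect to $\mathbf{X}$. The
remainder of this section is devoted to upper bounding $L$. We
first assume that $\Lambda_x = I_{t \times t}$, i.e. $\mathbf{X} \sim \mathcal{N}^n(0, I)$, and then
extend to general $\Lambda_x$.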

Define $\mathbf{Z} = [\mathbf{Y}, \mathbf{X}]$. We perform a QR decomposition of
$\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ in order to obtain friendlier expressions. As a
reminder, the QR decomposition of a matrix $A_{n \times k} = Q_{n \times k} R_{k \times k}$
(with $Q^T Q = I$ and $R$ upper triangular) is performed by the
Gram-Schmidt process. We start from the leftmost column of $A$ and
work our way to the last one. At each step we take a column of
$A$ and split it into the part which can be represented by a linear
combination of the columns to its left (equivalently, of
the columns of $Q$ already generated), and the "innovation",
i.e. the part which is orthogonal to the subspace generated by
the previous columns. The vector representing the innovation
is normalized and becomes the respective column of $Q$, and
its norm becomes the diagonal element in $R$. The coefficients
representing the part of the vector which lies in the subspace
of previous columns become the elements of $R$ above the
diagonal. Another important property of the QR decomposition is
that the determinant of $A^T A$ can be written in terms of the
diagonal elements of $R$: $|A^T A| = |R^T Q^T Q R| = |R^T R| = |R|^2 = \prod_{i=1}^{k} R_{ii}^2$.
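The determinant identity for the QR decomposition is easy to verify numerically. The sketch below uses NumPy's reduced QR routine on an arbitrary random matrix (the sizes are chosen here for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))

Q, R = np.linalg.qr(A)                  # reduced QR: Q is 8 x 3, R is 3 x 3
gram_det = np.linalg.det(A.T @ A)       # |A^T A|
diag_prod = np.prod(np.diag(R) ** 2)    # product of squared diagonal entries of R
```

Note that NumPy does not guarantee positive diagonal entries in `R`, which is immaterial here because only their squares enter the identity.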

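Now define the diagonals of the upper triangular matrices in the
QR decompositions of the matrices $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ to
be the vectors $\mathbf{a}$, $\mathbf{b}$ and $[\mathbf{c}, \mathbf{d}]$ respectively. I.e. if $\mathbf{X} = Q_x R_x$, $\mathbf{Y} = Q_y R_y$
and $\mathbf{Z} = Q_z R_z$ then $\mathbf{a} = \mathrm{diag}(R_x)$, $\mathbf{b} = \mathrm{diag}(R_y)$, and
$[\mathbf{c}, \mathbf{d}] = \mathrm{diag}(R_z)$. The lengths of the vectors $\mathbf{c}, \mathbf{d}$ are $r, t$
respectively, so that they align with the columns of $\mathbf{Y}$ and
$\mathbf{X}$ in the matrix $\mathbf{Z}$. We have:

$$\frac{|\hat{R}_{XX}| \cdot |\hat{R}_{YY}|}{|\hat{R}_{(XY)(XY)}|} = \frac{|\mathbf{X}^T\mathbf{X}| \cdot |\mathbf{Y}^T\mathbf{Y}|}{|\mathbf{Z}^T\mathbf{Z}|} = \frac{\prod_{i=1}^{t} a_i^2 \cdot \prod_{i=1}^{r} b_i^2}{\prod_{i=1}^{r} c_i^2 \cdot \prod_{i=1}^{t} d_i^2} \qquad (16)$$

Note that the $\frac{1}{n}$ factors cancel out because the matrix
dimensions are $t$ and $r$ in the numerator and $t + r$ in the denominator
(and that reordering the columns of $[\mathbf{X}\,\mathbf{Y}]$ into $\mathbf{Z}$ does not change the determinant).
Since the Gram-Schmidt process operates sequentially from
the first column to the last, and the first $r$ columns of $\mathbf{Z}$ and
$\mathbf{Y}$ are equal, we have $\mathbf{b} = \mathbf{c}$. Therefore we can write:

$$\frac{|\hat{R}_{XX}| \cdot |\hat{R}_{YY}|}{|\hat{R}_{(XY)(XY)}|} = \prod_{i=1}^{t} \frac{a_i^2}{d_i^2} \qquad (17)$$

Note that $a_i$ and $d_i$ both relate to the same vector, the $i$-th
column of $\mathbf{X}$. The ratio $a_i / d_i$ is the ratio between the innovation
of the $i$-th column of $\mathbf{X}$ with respect to the subspace spanned
by the previous columns of $\mathbf{X}$ alone (numerator) or by these columns
together with the columns of $\mathbf{Y}$ (denominator). For this
reason $|d_i| \le |a_i|$ (and therefore $R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y}) \ge 0$,
which is Property 1).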
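The representation of the rate function through innovation ratios can be checked directly: $R_{\mathrm{emp}} = \frac{1}{2}\log\prod_i a_i^2/d_i^2 = \sum_i \log(a_i/d_i)$. The following is a minimal sketch with arbitrary test data (the sizes and the dependence of `Y` on `X` are assumptions made here).

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, r = 40, 3, 2
X = rng.standard_normal((n, t))
Y = X @ rng.standard_normal((t, r)) + rng.standard_normal((n, r))

# Determinant form of the rate function.
logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
rate_det = 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

# Innovation norms: a from X alone, d from the last t columns of Z = [Y, X].
# NumPy may return negative diagonal entries, hence the absolute value.
a = np.abs(np.diag(np.linalg.qr(X)[1]))
d = np.abs(np.diag(np.linalg.qr(np.hstack([Y, X]))[1]))[r:]
rate_qr = np.sum(np.log(a / d))
```

Each `d[i]` is no larger than `a[i]`, because projecting out the columns of `Y` as well can only shrink the innovation.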

The key observation in this derivation is as follows: consider
a sequential drawing of the columns of $\mathbf{X}$ and calculation of
the factors $a_i / d_i$. Since the $i$-th column of $\mathbf{X}$ is chosen
isotropically and independently of the previous columns, the values
of the previous columns do not affect the distribution of the
innovations $d_i, a_i$ (only the number of dimensions spanned by the
previous columns does). Using this observation, which we prove
subsequently, we are able to break $L$, represented as the
expected value of a product (17), into a product of expected
values (equations (19)-(20)), and the proof is completed by a
(tedious) calculation of these expected values.



To show the independence of $a_i, d_i$ from previously drawn
values, denote by $\mathbf{X}_m^i$ the matrix comprising columns $m$ to
$i$ of $\mathbf{X}$, and by $\mathbf{x}_i$ the $i$-th column. Define a unitary $n \times n$
matrix $Q$ whose first $i - 1$ columns span the subspace spanned
by the first $i - 1$ columns of $\mathbf{X}$, whose next $r$ columns extend this
subspace to cover also the column space of $\mathbf{Y}$, and whose
last $n - (i - 1) - r$ columns complete it to an orthonormal
basis. This matrix does not depend on $\mathbf{X}_i^t$, and specifically not on
column $i$. We assume that the columns of $\mathbf{Y}$ are linearly
independent (we relax this assumption later). Also, with
probability one, assuming $n \ge t + r$, the columns of $\mathbf{X}_1^t$
are linearly independent of each other and of the columns of
$\mathbf{Y}$. To see this, note that the projection of each
column on any direction orthogonal to the subspace already
spanned by the previous ones (including $\mathbf{Y}$) is also Gaussian, and
therefore has probability 0 of being 0, as long as such
an orthogonal vector exists, i.e. the number of previously generated
vectors is smaller than $n$.

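Now define $\mathbf{z} = Q^T \mathbf{x}_i$. Since $\mathbf{x}_i \sim \mathcal{N}(0, I_{n \times n})$, also
$\mathbf{z} \sim \mathcal{N}(0, I_{n \times n})$. The first $i - 1$ elements of $\mathbf{z}$ represent the
projections of $\mathbf{x}_i$ on the subspace spanned by the previous columns
of $\mathbf{X}$, and the next $r$ elements represent the projections on the
subspace spanned by the columns of $\mathbf{Y}$. So $a_i^2$ collects the energy
of all elements except the first $i - 1$, and $d_i^2$ collects the energy
of all elements except the first $i - 1 + r$. To see this formally,
in the Gram-Schmidt process the coefficients of the projection
of $\mathbf{x}_i$ on the subspace spanned by $\mathbf{X}_1^{i-1}$ are $(Q_1^{i-1})^T \mathbf{x}_i$ and
the projection itself is $Q_1^{i-1} (Q_1^{i-1})^T \mathbf{x}_i$, therefore the innovation
is $\mathbf{v}_i = \mathbf{x}_i - Q_1^{i-1} (Q_1^{i-1})^T \mathbf{x}_i$. Since $Q$ is unitary and $Q^T \mathbf{v}_i$ equals
$\mathbf{z}$ with its first $i - 1$ elements set to zero, we have
$a_i^2 = \|\mathbf{v}_i\|^2 = \|Q^T \mathbf{v}_i\|^2 = \sum_{j=i}^{n} z_j^2$ and similarly
$d_i^2 = \sum_{j=i+r}^{n} z_j^2$. Therefore
$a_i, d_i$ are independent of $\mathbf{Y}$ and the previous columns of $\mathbf{X}$,
and can be given by norms over parts of a Gaussian i.i.d.
vector of length $n$. Defining

$$D_i = E\left[ \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \,\Big|\, \mathbf{X}_1^{i-1}, \mathbf{Y} \right] = E\left[ \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \right] \qquad (18)$$

where the equality is due to the independence shown above,
we have for any $k = 1, 2, \ldots, t$:

$$E\left[ \prod_{i=1}^{k} \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \right] = E\left[ \prod_{i=1}^{k-1} \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \cdot E\left[ \left( \frac{a_k^2}{d_k^2} \right)^{\gamma n / 2} \,\Big|\, \mathbf{X}_1^{k-1}, \mathbf{Y} \right] \right] = D_k \cdot E\left[ \prod_{i=1}^{k-1} \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \right] \qquad (19)$$

Therefore by induction:

$$L = E\left[ \prod_{i=1}^{t} \left( \frac{a_i^2}{d_i^2} \right)^{\gamma n / 2} \right] = \prod_{i=1}^{t} D_i \qquad (20)$$

Now we bound $D_i$ (using the previously defined Gaussian
vector $\mathbf{z}$):

$$D_i = E\left[ \left( \frac{\sum_{j=i}^{n} z_j^2}{\sum_{j=i+r}^{n} z_j^2} \right)^{\gamma n / 2} \right] \overset{(a)}{=} \int_0^\infty \!\! \int_0^\infty \left( \frac{s + h}{s} \right)^{\gamma n / 2} \frac{h^{\frac{r}{2} - 1} e^{-h/2}}{2^{r/2}\, \Gamma\left(\frac{r}{2}\right)} \cdot \frac{s^{\frac{n-i-r+1}{2} - 1} e^{-s/2}}{2^{\frac{n-i-r+1}{2}}\, \Gamma\left(\frac{n-i-r+1}{2}\right)} \, ds \, dh$$
$$\overset{(b)}{=} c_1 \int_0^\infty w^{\frac{n-i+1}{2} - 1} e^{-w/2} \, dw \cdot \int_0^\infty v^{\frac{(1-\gamma)n - i - r - 1}{2}} (v + 1)^{-\frac{(1-\gamma)n - i + 1}{2}} \, dv \qquad (21)$$

where $c_1 = \left[ 2^{\frac{n-i+1}{2}}\, \Gamma\left(\frac{r}{2}\right) \Gamma\left(\frac{n-i-r+1}{2}\right) \right]^{-1}$, in (a) we used the independent
chi-squared distributed variables $h$ (with $r$ degrees of freedom) and $s$ (with $n - i - r + 1$
degrees of freedom), so that $a_i^2 = s + h$ and $d_i^2 = s$,
and in (b) we changed variables from $s, h$ to $w = s + h$, $v = s/h$,
with inverse transformation $s = \frac{v}{v+1} w$, $h = \frac{1}{v+1} w$ and
Jacobian

$$J^{-1} = \left| \frac{\partial(w, v)}{\partial(s, h)} \right| = \left| \det \begin{pmatrix} 1 & 1 \\ 1/h & -s/h^2 \end{pmatrix} \right| = \frac{s + h}{h^2} = \frac{(v + 1)^2}{w}$$

The first integral in the expression above evaluates to:

$$\int_0^\infty w^{\frac{n-i+1}{2} - 1} e^{-w/2} \, dw = 2^{\frac{n-i+1}{2}}\, \Gamma\left( \frac{n-i+1}{2} \right) \qquad (22)$$

by definition of $\Gamma(\cdot)$. The second integral
behaves like $v^{\frac{(1-\gamma)n - i - r - 1}{2}}$ near $v = 0$ and like
$v^{\frac{(1-\gamma)n - i - r - 1}{2} - \frac{(1-\gamma)n - i + 1}{2}} = v^{-\frac{r+2}{2}}$ at $v \to \infty$. Therefore
it exists (converges) iff the power of $v$ near 0 is larger
than $-1$ and at $\infty$ is smaller than $-1$. The first condition is
$\frac{(1-\gamma)n - i - r - 1}{2} > -1 \iff (1 - \gamma) n > i + r - 1 \iff \gamma < 1 - \frac{i + r - 1}{n}$.
The other condition always holds since $r > 0$. Note that since
the power of $\frac{1}{v+1}$ is larger by more than 1 than the power of
$v$, it is positive (when the first condition holds). Therefore we
can bound: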



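$$C_v \le \int_{v=0}^{\infty} \left( \frac{1}{\max(v, 1)} \right)^{\frac{(1-\gamma)n - i + 1}{2}} v^{\frac{(1-\gamma)n - i - r - 1}{2}} \, dv = \int_0^1 v^{\frac{(1-\gamma)n - i - r - 1}{2}} \, dv + \int_1^\infty v^{-\frac{r}{2} - 1} \, dv = \frac{2}{(1-\gamma)n - i - r + 1} + \frac{2}{r} \qquad (23)$$

where $C_v$ denotes the second integral in (21).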
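As a sanity check on this bound, note that an integral of the form $\int_0^\infty v^{a-1} (1+v)^{-(a+b)} \, dv$ is the standard Beta integral $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$; identifying the second integral with $a = \frac{(1-\gamma)n - i - r + 1}{2}$ and $b = \frac{r}{2}$ is an observation made here and is not taken from the paper. The sketch below compares this closed form with the bound $\frac{2}{(1-\gamma)n - i - r + 1} + \frac{2}{r}$ for a few illustrative parameter values.

```python
import math

def second_integral_exact(n, i, r, gamma):
    """Closed form of the v-integral, treated as the Beta integral B(a, r/2)."""
    a = ((1 - gamma) * n - i - r + 1) / 2   # must be positive for convergence
    b = r / 2
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def second_integral_bound(n, i, r, gamma):
    """Upper bound obtained by replacing 1/(v+1) with 1/max(v, 1)."""
    return 2 / ((1 - gamma) * n - i - r + 1) + 2 / r

n, r, gamma = 40, 3, 0.5                    # illustrative values
pairs = [(second_integral_exact(n, i, r, gamma),
          second_integral_bound(n, i, r, gamma)) for i in (1, 2, 3)]
```

The bound is loose, which is harmless for the lemma: it only affects the constant multiplying the exponential term.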

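Combining (21), (22) and (23) we obtain:

$$D_i \le \frac{\Gamma\left( \frac{n-i+1}{2} \right)}{\Gamma\left( \frac{r}{2} \right) \Gamma\left( \frac{n-i-r+1}{2} \right)} \left( \frac{2}{(1-\gamma)n - i - r + 1} + \frac{2}{r} \right) \qquad (24)$$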

Since $L$ results in a rate loss of $\frac{1}{n} \log L$, and the Gamma
function terms in (24) are superexponential in $n$, we would like to express more
explicitly the dependence on $n$. Using $\Gamma(t + 1) = t\,\Gamma(t)$ with
t = 



_ n-t+l-2i ■ _ 



= 1, 2, .. \r/2] we can obtain the bound 



r(J*=fti 



therefore 
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and 
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(1 — 7)n — i — r + 1 ry 

$= C_L \cdot n^{t \lceil r/2 \rceil}$ (27)



Substituting the above into (14) proves Lemma 1 for $\Lambda_x = I$.
The two assumptions on the parameters of the problem that we
made in order for $L$ to be bounded are (a) $n \ge t + r$,
which was needed so that each new column of $\mathbf{X}$ is
not spanned by the previous columns and the columns of
$\mathbf{Y}$ with probability 1, and (b) $\forall i \le t : \gamma < 1 - \frac{i + r - 1}{n}$,
i.e. $\gamma < 1 - \frac{t + r - 1}{n}$, which is needed for the existence of $\{D_i\}_{i=1}^{t}$.

Suppose now that $\mathbf{X} \sim \mathcal{N}^n(0, \Lambda_x)$. Using the Cholesky
decomposition we can define a coloring matrix $W$, $W^T W = \Lambda_x$,
so that $\mathbf{X} = \mathbf{X}_w \cdot W$ and $\mathbf{X}_w \sim \mathcal{N}^n(0, I)$. Since by Property
2 the rate function is invariant to a linear transformation of
$\mathbf{X}$, we have $R_{\mathrm{emp}}(\mathbf{X}_w, \mathbf{Y}) = R_{\mathrm{emp}}(\mathbf{X}, \mathbf{Y})$; therefore if
Lemma 1 holds with respect to the white signal $\mathbf{X}_w$ it also
holds with respect to $\mathbf{X}$. With regard to the assumption that the
columns of $\mathbf{Y}$ are linearly independent: if they are not, then the
rate function is defined with respect to a smaller matrix $\mathbf{Y}'$ of size $n \times r'$, $r' < r$,
containing only the independent columns. Compared with a
full-rank $\mathbf{Y}$, the random variables $d_i$ increase (i.e. $d_i' \ge d_i$)
due to the smaller dimension of $\mathbf{Y}'$, therefore $L' \le L$ and the
lemma still holds. □
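The whitening argument can be illustrated numerically. In the sketch below, the covariance `Lambda_x`, the nonlinear channel and the helper `emp_rate` are assumptions chosen here for illustration; the coloring is written as right multiplication because each row of the $n \times t$ input matrix is one channel use.

```python
import numpy as np

def emp_rate(X, Y):
    """0.5*log(|Rxx| |Ryy| / |R(xy)(xy)|) in nats (hypothetical helper)."""
    n = X.shape[0]
    logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

rng = np.random.default_rng(6)
n, t, r = 200, 3, 2
Lambda_x = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])       # example positive definite covariance

# NumPy returns the lower-triangular factor C with C @ C.T = Lambda_x,
# so W = C.T satisfies W.T @ W = Lambda_x.
W = np.linalg.cholesky(Lambda_x).T
X_white = rng.standard_normal((n, t))
X = X_white @ W                              # rows have covariance Lambda_x

# Arbitrary (here nonlinear) channel acting on the colored input.
Y = np.tanh(X[:, :r]) + 0.3 * rng.standard_normal((n, r))
rate_colored = emp_rate(X, Y)
rate_white = emp_rate(X_white, Y)
```

The two rates agree because `X` and `X_white` differ by an invertible linear map, regardless of how `Y` was produced.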

V. Comments and extensions 

Comparison with the SISO case: Comparing Lemma 1
with Lemma 4 of our earlier work on the SISO case $r = t = 1$, which
is proven by a direct calculation, the bound here is slightly
worse due to the limitation $\gamma < 1 - \frac{t + r - 1}{n} = (n - 1)/n$, which
stems from the use of the Chernoff bound.



Comparison with MIMO capacity: The scheme above
achieves the mutual information of a Gaussian MIMO channel
but not its capacity. Achieving the capacity requires adaptation
of the input distribution, which for the known AWGN channel
$\mathbf{y} = H\mathbf{x} + \mathbf{v}$ is performed by SVD and water pouring.
The strength of the scheme lies in the lack of any assumptions
about the probability distribution, which makes it applicable,
for example, to non-Gaussian noise or to noise that depends on the
transmitted signal.

Exploiting temporal correlation: In the current results, as
in previous ones [3], the rate function depends on the
zero-order empirical probability, and lacks the ability to exploit
temporal correlation. However, the results can be used to
exploit such correlation in the SISO or MIMO channel, in
a crude way, by applying the scheme to blocks of $k$ channel
uses. The rate function over blocks is always superior to the
single-letter case, and the penalty is an increase in the fixed
redundancy.
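One way to read "applying the scheme to blocks of $k$ channel uses" is to treat every $k$ consecutive channel uses as a single super-symbol of dimension $kt$ (and $kr$ at the output). The helper below is a sketch of that regrouping only; the name `to_blocks` and the choice to drop a trailing partial block are assumptions made here.

```python
import numpy as np

def to_blocks(X, k):
    """Group every k consecutive channel uses (rows) into one super-symbol (row)."""
    n, t = X.shape
    usable = (n // k) * k                # drop a trailing partial block, if any
    return X[:usable].reshape(n // k, k * t)

X = np.arange(12.0).reshape(6, 2)        # n = 6 channel uses, t = 2 antennas
X_blocks = to_blocks(X, 3)               # 2 super-symbols of dimension 6
```

After regrouping, the same rate function would be evaluated on matrices with $n/k$ rows and $kt$, $kr$ columns, which is where the larger fixed redundancy comes from.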

Using empirical covariance instead of correlation: When
the matrices $\hat{R}$ in (1) are replaced with the empirical covariance
matrices $\hat{C}$ (where $\hat{C}_X = n^{-1} (\mathbf{X} - n^{-1} \mathbf{1}\mathbf{1}^T \mathbf{X})^T (\mathbf{X} - n^{-1} \mathbf{1}\mathbf{1}^T \mathbf{X})$),
the derivation is similar, except that a projection on an additional
dimension (the all-ones vector) precedes the other projections.
The results are the same with a loss of one dimension:
$\gamma < 1 - \frac{t + r}{n}$ and $n > t + r$ are required, and there is a
small variation in $C_L$.
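The covariance-based variant amounts to centering each column before forming the matrices. The following minimal sketch (the function name `emp_rate_cov` and the test data are assumptions made here) shows the centering step and checks that, unlike the correlation-based function, the result is unaffected by constant offsets.

```python
import numpy as np

def emp_rate_cov(X, Y):
    """Rate function with empirical covariances: columns are centered first."""
    n = X.shape[0]
    # Subtracting the column means equals X - (1/n) * ones @ ones.T @ X.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    logdet = lambda A: np.linalg.slogdet(A.T @ A / n)[1]
    return 0.5 * (logdet(Xc) + logdet(Yc) - logdet(np.hstack([Xc, Yc])))

rng = np.random.default_rng(7)
n, t, r = 60, 2, 2
X = rng.standard_normal((n, t))
Y = X + 0.5 * rng.standard_normal((n, r))

rate_plain = emp_rate_cov(X, Y)
rate_offset = emp_rate_cov(X + 4.0, Y - 7.0)   # constant offsets are removed by centering
```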

The complex MIMO channel: The results easily extend to
the complex-valued MIMO channel, using the same technique.
The main difference is a doubled number of degrees of freedom
in the derivation of $D_i$, which doubles the rate compared to
Equation (1).

Adaptivity: In [3], [4] we presented a communication
scheme using a low-rate feedback, which dynamically adapts
the transmission rate and achieves the rate functions without
outage. Such schemes are of higher practical interest. It is
possible to show that the adaptive scheme of [3], [4] achieves
$R_{\mathrm{emp}}$ of (1) up to asymptotically vanishing redundancy, and up
to a set of $\mathbf{x}$ sequences having vanishing probability.